The Application of TD(λ) Learning to the Opening Games of 19×19 Go
Abstract
This paper describes the results of applying Temporal Difference (TD) learning with a neural network to opening-game problems in Go. The main difference from other research is that this experiment applied TD learning to the full-sized (19×19) game of Go rather than a reduced version (e.g., the 9×9 game). We discuss and compare TD(λ) learning for predicting the winner of an opening game and for finding the best game among prototypical professional opening games. We also tested the performance of the trained TD(λ) networks by playing them against each other and against commercial Go programs. The empirical results for picking the best game are promising, but there is no guarantee that TD(λ) will always pick the same opening game regardless of the λ value. The competition between two TD(λ) players shows that TD(λ) with a higher λ performs better.
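As context for the TD(λ) predictions discussed above, a minimal tabular TD(λ) sketch is given below. The toy chain of states, the reward, and all parameter values are illustrative assumptions; the paper itself trained a network on full 19×19 positions rather than a lookup table.

```python
def td_lambda(episodes, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Each episode is a list of (state, next_state, reward) transitions;
    V maps states to predicted returns (e.g., a predicted game outcome).
    """
    V = {}
    for episode in episodes:
        e = {}  # eligibility traces, reset at the start of each episode
        for s, s_next, r in episode:
            V.setdefault(s, 0.0)
            V.setdefault(s_next, 0.0)
            delta = r + gamma * V[s_next] - V[s]  # one-step TD error
            e[s] = e.get(s, 0.0) + 1.0            # bump the trace for s
            for state in list(e):
                V[state] += alpha * delta * e[state]
                e[state] *= gamma * lam           # decay all traces
    return V


# Toy two-state chain A -> B -> terminal T, reward 1 on the final move.
episodes = [[("A", "B", 0.0), ("B", "T", 1.0)]] * 100
V = td_lambda(episodes)
```

With a high λ, credit for the terminal reward propagates back to earlier states within a single episode via the traces, which is why both V["A"] and V["B"] approach 1 here.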
Similar References
Honte, a Go-Playing Program Using Neural Nets
The go-playing program Honte is described. It uses neural nets together with more conventional AI methods such as alpha-beta search. A neural net is trained by supervised learning to imitate local shapes made in a database of expert games. A second net is trained to estimate the safety of groups by self-play using TD(λ) learning. A third net is trained to estimate the territorial potential of unoccupie...
The Effect of Electronic Puzzle Games on Improving the Reading Performance of Students with Learning Disorders
The purpose of this study was to investigate the effect of electronic puzzle games on improving the reading performance of students with learning disorders in Gorgan. The research was applied in purpose and quasi-experimental in method, using a pre-test/post-test design with a control group. The statistical population consisted of 255 female students of the Shokufa and Dena provincial centers and non-profit cen...
Active Opening Book Application for Monte-Carlo Tree Search in 19×19 Go
The dominant approach for programs playing the Asian board game of Go is nowadays Monte-Carlo Tree Search (MCTS). However, MCTS does not perform well in the opening phase of the game, as the branching factor is high and the consequences of moves can be far delayed. Human knowledge about Go openings is typically captured in joseki, local sequences of moves that are considered optimal for both player...
TDLeaf(λ): Combining Temporal Difference Learning with Game-Tree Search
In this paper we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(λ) and another less radical variant, TD-directed(λ). In particular, our chess program, "KnightCap," used TDLeaf(λ) to learn its evaluation fun...
Learning to Play Chess Using TD(λ)-Learning with Database Games
In this paper we present some experiments in the training of different evaluation functions for a chess program through reinforcement learning. A neural network is used as the evaluation function of the chess program. Learning occurs by using TD(λ)-learning on the results of high-level database games. Experiments are performed with different classes of features and neural network architectures....